SUM Query Processing over Probabilistic Data
Authors
Abstract
SUM queries are crucial for many applications that deal with probabilistic data. In this report, we study ALL_SUM queries, which return all possible sum values together with their probabilities. In general, no efficient solution exists for evaluating ALL_SUM queries. However, for many practical applications, where aggregate values are small integers or real numbers with limited precision, efficient solutions are possible. Based on a recursive approach, we propose a complete solution for this problem. We implemented our solution and conducted an extensive experimental evaluation over synthetic and real-world data sets; the results show its effectiveness.
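To make the idea of an ALL_SUM query concrete, the following is a minimal sketch and not the report's actual algorithm. It assumes a simple tuple-level uncertainty model in which each tuple carries an integer aggregate value and is present independently with a given probability. The function name `all_sum` and the `(value, probability)` input format are hypothetical, chosen only for illustration. The sketch processes one tuple at a time and updates the distribution of the running sum, which stays small when the aggregate values are small integers.

```python
from collections import defaultdict


def all_sum(tuples):
    """Return {sum_value: probability} over all possible worlds.

    Assumption (not from the report): `tuples` is a list of
    (value, probability) pairs, and each tuple is present
    independently with the given probability.
    """
    # Before any tuple is considered, the sum is 0 with certainty.
    dist = {0: 1.0}
    for value, p in tuples:
        nxt = defaultdict(float)
        for s, q in dist.items():
            nxt[s] += q * (1.0 - p)   # worlds where the tuple is absent
            nxt[s + value] += q * p   # worlds where the tuple is present
        dist = dict(nxt)
    return dist


if __name__ == "__main__":
    # Three uncertain tuples with small integer values.
    result = all_sum([(1, 0.5), (2, 0.5), (1, 0.2)])
    for total in sorted(result):
        print(total, result[total])
```

With n tuples there are up to 2^n possible worlds, but under these assumptions the number of distinct sums is bounded by the range of achievable totals, so the work per tuple is proportional to that range rather than to the number of worlds.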
Similar resources
Analytics over Probabilistic Unmerged Duplicates
This paper introduces probabilistic databases with unmerged duplicates (DBud), i.e., databases containing probabilistic information about instances found to describe the same real-world objects. We discuss the need for efficiently querying such databases and for supporting practical query scenarios that require analytical or summarized information. We also sketch possible methodologies and tech...
Continuous Probabilistic Sum Queries in Wireless Sensor Networks with Ranges
Data measured in wireless sensor networks are inherently imprecise, due to a number of reasons, and aggregate queries are often used to analyze the collected data in order to alleviate the impact of such imprecision. In this paper we will deal with the imprecision in the measured values explicitly by employing a probabilistic approach and we focus on one particular type of aggregate query, name...
Efficient SUM Query Processing over Uncertain Data
SUM queries are crucial for many applications that need to deal with probabilistic data. In this paper, we are interested in the queries, called ALL_SUM, that return all possible sum values and their probabilities. In general, there is no efficient solution for the problem of evaluating ALL_SUM queries. But, for many practical applications, where aggregate values are small integers or real numb...
Query Processing over Uncertain Data
An uncertain or probabilistic database is defined as a probability distribution over a set of deterministic database instances called possible worlds. In the classical deterministic setting, the query processing problem is to compute the set of tuples representing the answer of a given query on a given database. In the probabilistic setting, this problem becomes the computation of all pairs (t,...
Efficient Query Evaluation over Temporally Correlated Probabilistic Streams
Many real world applications such as sensor networks and other monitoring applications naturally generate probabilistic streams that are highly correlated in both time and space. Query processing over such streaming data must be cognizant of these correlations, since they significantly alter the final query results. Several prior works have suggested approaches to handling correlations in proba...